#include <agmfit.h>

Collaboration diagram for TAGMFit:

Public Member Functions
	TAGMFit ()
	~TAGMFit ()
	TAGMFit (const PUNGraph &GraphPt, const TVec< TIntV > &CmtyVVPt, const int RndSeed=0)
	COMMENT. Use to describribe parameters.
	TAGMFit (const PUNGraph &GraphPt, const int InitComs, const int RndSeed=0)
	COMMENT. Use to describribe parameters.
	TAGMFit (const PUNGraph &GraphPt, const TVec< TIntV > &CmtyVVPt, const TRnd &RndPt)
	COMMENT. Use to describribe parameters.
void	Save (TSOut &SOut)
void	Load (TSIn &SIn, const int &RndSeed=0)
void	RandomInitCmtyVV (const int InitComs, const double ComSzAlpha=1.3, const double MemAlpha=1.8, const int MinComSz=8, const int MaxComSz=200, const int MinMem=1, const int MaxMem=10)
	Randomly initialize bipartite community affiliation graph.
void	AddBaseCmty ()
	Add Epsilon community (base community which includes all nodes) into community affiliation graph. This means that we will later fit the value of epsilon.
double	Likelihood ()
	COMMENT.
double	Likelihood (const TFltV &NewLambdaV)
double	Likelihood (const TFltV &NewLambdaV, double &LEdges, double &LNoEdges)
void	SetRegCoef (const double Val)
void	GetEdgeJointCom ()
	For each `(u, v)` in edges, precompute `C_uv` (the set of communities nodes u and v share).
void	NeighborComInit (const int InitComs)
	Initialize node community memberships using best neighborhood communities (see D. Gleich et al. KDD'12).
void	GradLogLForLambda (TFltV &GradV)
int	MLEGradAscentGivenCAG (const double &Thres=0.001, const int &MaxIter=10000, const TStr PlotNm=TStr())
	Gradient descent for `p_c` while keeping the community affiliation graph (CAG) fixed.
void	SetDefaultPNoCom ()
	Set Epsilon (the probability that two nodes sharing no communities connect) to the default value.
void	SetPNoCom (const double &Epsilon)
	Set Epsilon (the probability that two nodes sharing no communities connect) to the default value.
double	GetPNoCom ()
double	CalcPNoComByCmtyVV (const int &SamplePairs=-1)
	Compute the empirical edge probability between a pair of nodes who share no community (epsilon), based on current community affiliations.
void	GetNewtonStep (TFltVV &HVV, TFltV &GradV, TFltV &DeltaLV)
double	SelectLambdaSum (const TIntSet &ComK)
	Compute sum of `lambda_c` (which is log (1 - `p_c`)) over `C_uv` (`ComK`). The function is used to compute edge probability `P_uv`.
double	SelectLambdaSum (const TFltV &NewLambdaV, const TIntSet &ComK)
	COMMENT.
void	RandomInit (const int &MaxK)
	COMMENT.
void	RunMCMC (const int &MaxIter, const int &EvalLambdaIter, const TStr &PlotFPrx=TStr())
	Main procedure for fitting the AGM to a given graph using MCMC.
void	SampleTransition (int &NID, int &JoinCID, int &LeaveCID, double &DeltaL)
	Sample MMCM transitions: Choose among (`join, leave, switch`), and then sample (`NID, CID`).
void	InitNodeData ()
	COMMENT.
void	LeaveCom (const int &NID, const int &CID)
	After MCMC, `NID` leaves community `CID`.
void	JoinCom (const int &NID, const int &JoinCID)
int	RemoveEmptyCom ()
	Remove all communities with no members.
double	SeekLeave (const int &UID, const int &CID)
	Compute the change in likelihood (Delta) if node `UID` leaves community `CID`.
double	SeekJoin (const int &UID, const int &CID)
	Compute the change in likelihood (Delta) if node `UID` joins community `CID`.
double	SeekSwitch (const int &UID, const int &CurCID, const int &NewCID)
double	GetStepSizeByLineSearchForLambda (const TFltV &DeltaV, const TFltV &GradV, const double &Alpha, const double &Beta)
	Step size search for updating P_c (which is parametarized by regularization parameter lambda).
void	SetLambdaV (const TFltV &LambdaPt)
	COMMENT.
void	GetLambdaV (TFltV &OutV)
	COMMENT.
void	GetQV (TFltV &OutV)
	Returns `QV`, a vector of (1 - `p_c`) for each community `c`.
void	GetCmtyVV (TVec< TIntV > &CmtyVV, const double QMax=2.0)
	Get communities whose p_c is higher than 1 - `QMax`.
void	GetCmtyVV (TVec< TIntV > &CmtyVV, TFltV &QV, const double QMax=2.0)
	COMMENT.
void	SetCmtyVV (const TVec< TIntV > &CmtyVV)
	COMMENT.
void	PrintSummary ()
	COMMENT.
Private Attributes
PUNGraph	G
	Graph to fit.
TVec< TIntSet >	CIDNSetV
	Community ID -> Member Node ID Sets.
THash< TIntPr, TIntSet >	EdgeComVH
	Edge -> Shared Community ID Set.
THash< TInt, TIntSet >	NIDComVH
	Node ID -> Communitie IDs the node belongs to.
TIntV	ComEdgesV
	The number of edges in each community.
TFlt	PNoCom
	Probability of edge when two nodes share no community (epsilon in the paper).
TFltV	LambdaV
	Parametrization of P_c (edge probability in community c), P_c = 1 - exp(-lambda).
TRnd	Rnd
THash< TIntPr, TFlt >	NIDCIDPrH
	<Node ID, Community ID> pairs (for sampling MCMC moves).
THash< TIntPr, TInt >	NIDCIDPrS
	<Node ID, Community ID> pairs (for sampling MCMC moves).
TFlt	MinLambda
	Minimum value of regularization parameter lambda (default = 1e-5).
TFlt	MaxLambda
	Maximum value of regularization parameter lambda (default = 10).
TFlt	RegCoef
	Regularization parameter when we fit for P_c (for finding # communities).
TInt	BaseCID
	ID of the Epsilon-community (in case we fit P_c of the epsilon community). We do not fit for the Epsilon-community in general.

Detailed Description

Fitting the Affilialiton Graph Model (AGM).

Definition at line 8 of file agmfit.h.

Constructor & Destructor Documentation

TAGMFit::TAGMFit ( ) [inline]

Definition at line 26 of file agmfit.h.

{ }

TAGMFit::~TAGMFit ( ) [inline]

Definition at line 27 of file agmfit.h.

{ }

TAGMFit::TAGMFit	(	const PUNGraph &	GraphPt,
		const TVec< TIntV > &	CmtyVVPt,
		const int	RndSeed = `0`
	)		`[inline]`

COMMENT. Use to describribe parameters.

Definition at line 29 of file agmfit.h.

References SetCmtyVV().

: G(GraphPt), PNoCom(0.0), Rnd(RndSeed), MinLambda(0.00001), MaxLambda(10.0), RegCoef(0), BaseCID(-1) { SetCmtyVV(CmtyVVPt);  }

Here is the call graph for this function:

TAGMFit::TAGMFit	(	const PUNGraph &	GraphPt,
		const int	InitComs,
		const int	RndSeed = `0`
	)		`[inline]`

COMMENT. Use to describribe parameters.

Definition at line 31 of file agmfit.h.

References NeighborComInit().

: G(GraphPt), PNoCom(0.0), Rnd(RndSeed), MinLambda(0.00001), MaxLambda(10.0), RegCoef(0), BaseCID(-1) { NeighborComInit(InitComs); }//RandomInitCmtyVV(InitComs);  }

Here is the call graph for this function:

TAGMFit::TAGMFit	(	const PUNGraph &	GraphPt,
		const TVec< TIntV > &	CmtyVVPt,
		const TRnd &	RndPt
	)		`[inline]`

COMMENT. Use to describribe parameters.

Definition at line 33 of file agmfit.h.

References SetCmtyVV().

: G(GraphPt), PNoCom(0.0), Rnd(RndPt), MinLambda(0.00001), MaxLambda(10.0), RegCoef(0), BaseCID(-1) { SetCmtyVV(CmtyVVPt); }

Here is the call graph for this function:

Member Function Documentation

void TAGMFit::AddBaseCmty ( )

Add Epsilon community (base community which includes all nodes) into community affiliation graph. This means that we will later fit the value of epsilon.

Definition at line 251 of file agmfit.cpp.

References TVec< TVal, TSizeTy >::Add(), BaseCID, CIDNSetV, G, GetCmtyVV(), TUNGraph::GetNIdV(), IAssert, InitNodeData(), TVec< TVal, TSizeTy >::Len(), PNoCom, and SetCmtyVV().

                          {
  TVec<TIntV> CmtyVV;
  GetCmtyVV(CmtyVV);
  TIntV TmpV = CmtyVV[0];
  CmtyVV.Add(TmpV);
  G->GetNIdV(CmtyVV[0]);
  IAssert(CIDNSetV.Len() + 1 == CmtyVV.Len());
  SetCmtyVV(CmtyVV);
  InitNodeData();
  BaseCID = 0;
  PNoCom = 0.0;
}

Here is the call graph for this function:

double TAGMFit::CalcPNoComByCmtyVV ( const int & SamplePairs = -1 )

Compute the empirical edge probability between a pair of nodes who share no community (epsilon), based on current community affiliations.

Definition at line 632 of file agmfit.cpp.

References G, THash< TKey, TDat, THashFunc >::GetDat(), TAGMUtil::GetIntersection(), TUNGraph::GetNIdV(), TUNGraph::GetNodes(), TUInt64::GetStr(), TUNGraph::IsEdge(), TVec< TVal, TSizeTy >::Len(), THashSet< TKey, THashFunc >::Len(), NIDComVH, PNoCom, and TFlt::Val.

                                                         {
  TIntV NIdV;
  G->GetNIdV(NIdV);
  uint64 PairNoCom = 0, EdgesNoCom = 0;
  for (int u = 0; u < NIdV.Len(); u++) {
    for (int v = u + 1; v < NIdV.Len(); v++) {
      int SrcNID = NIdV[u], DstNID = NIdV[v];
      TIntSet JointCom;
      TAGMUtil::GetIntersection(NIDComVH.GetDat(SrcNID),NIDComVH.GetDat(DstNID),JointCom);
      if(JointCom.Len() == 0) {
        PairNoCom++;
        if (G->IsEdge(SrcNID, DstNID)) { EdgesNoCom++; }
        if (SamplePairs > 0 && PairNoCom >= (uint64) SamplePairs) { break; }
      }
    }
    if (SamplePairs > 0 && PairNoCom >= (uint64) SamplePairs) { break; }
  }
  double DefaultVal = 1.0 / (double)G->GetNodes() / (double)G->GetNodes();
  if (EdgesNoCom > 0) {
    PNoCom = (double) EdgesNoCom / (double) PairNoCom;
  } else {
    PNoCom = DefaultVal;
  }
  printf("%s / %s edges without joint com detected (PNoCom = %f)\n", TUInt64::GetStr(EdgesNoCom).CStr(), TUInt64::GetStr(PairNoCom).CStr(), PNoCom.Val);
  return PNoCom;
}

Here is the call graph for this function:

void TAGMFit::GetCmtyVV	(	TVec< TIntV > &	CmtyVV,
		const double	QMax = `2.0`
	)

Get communities whose p_c is higher than 1 - QMax.

Definition at line 554 of file agmfit.cpp.

Referenced by AddBaseCmty(), and TAGMUtil::FindComsByAGM().

                                                              {
  TFltV TmpQV;
  GetCmtyVV(CmtyVV, TmpQV, QMax);
}

Here is the caller graph for this function:

void TAGMFit::GetCmtyVV	(	TVec< TIntV > &	CmtyVV,
		TFltV &	QV,
		const double	QMax = `2.0`
	)

COMMENT.

Definition at line 559 of file agmfit.cpp.

References TVec< TVal, TSizeTy >::Add(), THash< TKey, TDat, THashFunc >::AddDat(), BaseCID, CIDNSetV, G, TVec< TVal, TSizeTy >::Gen(), TUNGraph::GetNodes(), IAssert, LambdaV, TVec< TVal, TSizeTy >::Len(), and MinLambda.

                                                                         {
  CmtyVV.Gen(CIDNSetV.Len(), 0);
  QV.Gen(CIDNSetV.Len(), 0);
  TIntFltH CIDLambdaH(CIDNSetV.Len());
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    CIDLambdaH.AddDat(c, LambdaV[c]);
  }
  CIDLambdaH.SortByDat(false);
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    int CID = CIDLambdaH.GetKey(c);
    IAssert(LambdaV[CID] >= MinLambda);
    double Q = exp( - (double) LambdaV[CID]);
    if (Q > QMax) { continue; }
    TIntV CmtyV;
    CIDNSetV[CID].GetKeyV(CmtyV);
    if (CmtyV.Len() == 0) { continue; }
    if (CID == BaseCID) { //if the community is the base community(epsilon community), discard
      IAssert(CmtyV.Len() == G->GetNodes());
    } else {
      CmtyVV.Add(CmtyV);
      QV.Add(Q);
    }
  }
}

Here is the call graph for this function:

void TAGMFit::GetEdgeJointCom ( )

For each (u, v) in edges, precompute C_uv (the set of communities nodes u and v share).

Definition at line 50 of file agmfit.cpp.

References THash< TKey, TDat, THashFunc >::AddDat(), TUNGraph::BegNI(), CIDNSetV, ComEdgesV, EdgeComVH, TUNGraph::EndNI(), G, THash< TKey, TDat, THashFunc >::Gen(), TVec< TVal, TSizeTy >::Gen(), THash< TKey, TDat, THashFunc >::GetDat(), TUNGraph::GetEdges(), TAGMUtil::GetIntersection(), IAssert, THash< TKey, TDat, THashFunc >::IsKey(), THash< TKey, TDat, THashFunc >::Len(), TVec< TVal, TSizeTy >::Len(), THashSet< TKey, THashFunc >::Len(), and NIDComVH.

Referenced by InitNodeData().

                              {
  ComEdgesV.Gen(CIDNSetV.Len());
  EdgeComVH.Gen(G->GetEdges());
  for (TUNGraph::TNodeI SrcNI = G->BegNI(); SrcNI < G->EndNI(); SrcNI++) {
    int SrcNID = SrcNI.GetId();
    for (int v = 0; v < SrcNI.GetDeg(); v++) {
      int DstNID = SrcNI.GetNbrNId(v);
      if (SrcNID >= DstNID) { continue; }
      TIntSet JointCom;
      IAssert(NIDComVH.IsKey(SrcNID));
      IAssert(NIDComVH.IsKey(DstNID));
      TAGMUtil::GetIntersection(NIDComVH.GetDat(SrcNID), NIDComVH.GetDat(DstNID), JointCom);
      EdgeComVH.AddDat(TIntPr(SrcNID,DstNID),JointCom);
      for (int k = 0; k < JointCom.Len(); k++) {
        ComEdgesV[JointCom[k]]++;
      }
    }
  }
  IAssert(EdgeComVH.Len() == G->GetEdges());
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::GetLambdaV ( TFltV & OutV ) [inline]

COMMENT.

Definition at line 93 of file agmfit.h.

References LambdaV.

{OutV = LambdaV;}

void TAGMFit::GetNewtonStep	(	TFltVV &	HVV,
		TFltV &	GradV,
		TFltV &	DeltaLV
	)

double TAGMFit::GetPNoCom ( ) [inline]

Definition at line 58 of file agmfit.h.

References PNoCom.

{ return PNoCom; }

void TAGMFit::GetQV ( TFltV & OutV )

Returns QV, a vector of (1 - p_c) for each community c.

Definition at line 451 of file agmfit.cpp.

References TVec< TVal, TSizeTy >::Gen(), LambdaV, and TVec< TVal, TSizeTy >::Len().

                               {
  OutV.Gen(LambdaV.Len());
  for (int i = 0; i < LambdaV.Len(); i++) {
    OutV[i] = exp(- LambdaV[i]);
  }
}

Here is the call graph for this function:

double TAGMFit::GetStepSizeByLineSearchForLambda	(	const TFltV &	DeltaV,
		const TFltV &	GradV,
		const double &	Alpha,
		const double &	Beta
	)

Step size search for updating P_c (which is parametarized by regularization parameter lambda).

Definition at line 109 of file agmfit.cpp.

References TLinAlg::DotProduct(), IAssert, LambdaV, TVec< TVal, TSizeTy >::Len(), Likelihood(), MaxLambda, and MinLambda.

Referenced by MLEGradAscentGivenCAG().

                                                                                                                                 {
  double StepSize = 1.0;
  double InitLikelihood = Likelihood();
  IAssert(LambdaV.Len() == DeltaV.Len());
  TFltV NewLambdaV(LambdaV.Len());
  for (int iter = 0; ; iter++) {
    for (int i = 0; i < LambdaV.Len(); i++) {
      NewLambdaV[i] = LambdaV[i] + StepSize * DeltaV[i];
      if (NewLambdaV[i] < MinLambda) { NewLambdaV[i] = MinLambda; }
      if (NewLambdaV[i] > MaxLambda) { NewLambdaV[i] = MaxLambda; }
    }
    if (Likelihood(NewLambdaV) < InitLikelihood + Alpha * StepSize * TLinAlg::DotProduct(GradV, DeltaV)) {
      StepSize *= Beta;
    } else {
      break;
    }
  }
  return StepSize;
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::GradLogLForLambda ( TFltV & GradV )

Definition at line 595 of file agmfit.cpp.

References THashSet< TKey, THashFunc >::BegI(), CIDNSetV, ComEdgesV, EdgeComVH, THashSet< TKey, THashFunc >::EndI(), TVec< TVal, TSizeTy >::Gen(), LambdaV, THash< TKey, TDat, THashFunc >::Len(), TVec< TVal, TSizeTy >::Len(), THashSet< TKey, THashFunc >::Len(), PNoCom, RegCoef, and SelectLambdaSum().

Referenced by MLEGradAscentGivenCAG().

                                            {
  GradV.Gen(LambdaV.Len());
  TFltV SumEdgeProbsV(LambdaV.Len());
  for (int e = 0; e < EdgeComVH.Len(); e++) {
    TIntSet& JointCom = EdgeComVH[e];
    double LambdaSum = SelectLambdaSum(JointCom);
    double Puv = 1 - exp(- LambdaSum);
    if (JointCom.Len() == 0) {  Puv = PNoCom;  }
    for (TIntSet::TIter SI = JointCom.BegI(); SI < JointCom.EndI(); SI++) {
      SumEdgeProbsV[SI.GetKey()] += (1 - Puv) / Puv;
    }
  }
  for (int k = 0; k < LambdaV.Len(); k++) {
    int MaxEk = CIDNSetV[k].Len() * (CIDNSetV[k].Len() - 1) / 2;
    int NotEdgesInCom = MaxEk - ComEdgesV[k];
    GradV[k] = SumEdgeProbsV[k] - (double) NotEdgesInCom;
    if (LambdaV[k] > 0.0 && RegCoef > 0.0) { //if regularization exists
      GradV[k] -= RegCoef;
    }
  }
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::InitNodeData ( )

COMMENT.

Definition at line 264 of file agmfit.cpp.

References THash< TKey, TDat, THashFunc >::AddDat(), THash< TKey, TDat, THashFunc >::AddKey(), TUNGraph::BegNI(), CIDNSetV, ComEdgesV, TSnap::DelSelfEdges(), TVec< TVal, TSizeTy >::EndI(), TUNGraph::EndNI(), G, THash< TKey, TDat, THashFunc >::Gen(), TVec< TVal, TSizeTy >::Gen(), GetEdgeJointCom(), TAGMUtil::GetNodeMembership(), TUNGraph::GetNodes(), LambdaV, TVec< TVal, TSizeTy >::Len(), MaxLambda, MinLambda, NIDCIDPrS, and NIDComVH.

Referenced by AddBaseCmty(), NeighborComInit(), RandomInit(), RunMCMC(), and SetCmtyVV().

                           {
  TSnap::DelSelfEdges(G);
  NIDComVH.Gen(G->GetNodes());
  for (TUNGraph::TNodeI NI = G->BegNI(); NI < G->EndNI(); NI++) {
    NIDComVH.AddDat(NI.GetId());
  }
  TAGMUtil::GetNodeMembership(NIDComVH, CIDNSetV);
  GetEdgeJointCom();
  LambdaV.Gen(CIDNSetV.Len());
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    int MaxE = (CIDNSetV[c].Len()) * (CIDNSetV[c].Len() - 1) / 2;
    if (MaxE < 2) {
      LambdaV[c] = MaxLambda;
    }
    else{
      LambdaV[c] = -log((double) (MaxE - ComEdgesV[c]) / MaxE);
    }
    if (LambdaV[c] > MaxLambda) {  LambdaV[c] = MaxLambda;  }
    if (LambdaV[c] < MinLambda) {  LambdaV[c] = MinLambda;  }
  }
  NIDCIDPrS.Gen(G->GetNodes() * 10);
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    for (TIntSet::TIter SI = CIDNSetV[c].BegI(); SI < CIDNSetV[c].EndI(); SI++) {
      NIDCIDPrS.AddKey(TIntPr(SI.GetKey(), c));
    }
  }
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::JoinCom	(	const int &	NID,
		const int &	JoinCID
	)

Definition at line 309 of file agmfit.cpp.

References THash< TKey, TDat, THashFunc >::AddKey(), THashSet< TKey, THashFunc >::AddKey(), CIDNSetV, ComEdgesV, EdgeComVH, G, THash< TKey, TDat, THashFunc >::GetDat(), TUNGraph::TNodeI::GetDeg(), TUNGraph::TNodeI::GetNbrNId(), TUNGraph::GetNI(), THashSet< TKey, THashFunc >::IsKey(), TMath::Mn(), TMath::Mx(), NIDCIDPrS, and NIDComVH.

Referenced by RunMCMC().

                                                        {
  TUNGraph::TNodeI NI = G->GetNI(NID);
  for (int e = 0; e < NI.GetDeg(); e++) {
    int VID = NI.GetNbrNId(e);
    if (NIDComVH.GetDat(VID).IsKey(JoinCID)) {
      TIntPr SrcDstNIDPr = TIntPr(TMath::Mn(NID,VID), TMath::Mx(NID,VID));
      EdgeComVH.GetDat(SrcDstNIDPr).AddKey(JoinCID);
      ComEdgesV[JoinCID]++;
    }
  }
  CIDNSetV[JoinCID].AddKey(NID);
  NIDComVH.GetDat(NID).AddKey(JoinCID);
  NIDCIDPrS.AddKey(TIntPr(NID, JoinCID));
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::LeaveCom	(	const int &	NID,
		const int &	CID
	)

After MCMC, NID leaves community CID.

Definition at line 293 of file agmfit.cpp.

References CIDNSetV, ComEdgesV, THash< TKey, TDat, THashFunc >::DelKey(), THashSet< TKey, THashFunc >::DelKey(), EdgeComVH, G, THash< TKey, TDat, THashFunc >::GetDat(), TUNGraph::TNodeI::GetDeg(), TUNGraph::TNodeI::GetNbrNId(), TUNGraph::GetNI(), THashSet< TKey, THashFunc >::IsKey(), TMath::Mn(), TMath::Mx(), NIDCIDPrS, and NIDComVH.

Referenced by RunMCMC().

                                                     {
  TUNGraph::TNodeI NI = G->GetNI(NID);
  for (int e = 0; e < NI.GetDeg(); e++) {
    int VID = NI.GetNbrNId(e);
    if (NIDComVH.GetDat(VID).IsKey(CID)) {
      TIntPr SrcDstNIDPr = TIntPr(TMath::Mn(NID,VID), TMath::Mx(NID,VID));
      EdgeComVH.GetDat(SrcDstNIDPr).DelKey(CID);
      ComEdgesV[CID]--;
    }
  }
  CIDNSetV[CID].DelKey(NID);
  NIDComVH.GetDat(NID).DelKey(CID);
  NIDCIDPrS.DelKey(TIntPr(NID, CID));
}

Here is the call graph for this function:

Here is the caller graph for this function:

double TAGMFit::Likelihood ( )

COMMENT.

Definition at line 104 of file agmfit.cpp.

References LambdaV.

Referenced by TAGMUtil::FindComsByAGM(), GetStepSizeByLineSearchForLambda(), MLEGradAscentGivenCAG(), PrintSummary(), and RunMCMC().

                           { 
  return Likelihood(LambdaV); 
}

Here is the caller graph for this function:

double TAGMFit::Likelihood ( const TFltV & NewLambdaV ) [inline]

Definition at line 43 of file agmfit.h.

References Likelihood().

Referenced by Likelihood().

{ double Tmp1, Tmp2; return Likelihood(NewLambdaV, Tmp1, Tmp2); }

Here is the call graph for this function:

Here is the caller graph for this function:

double TAGMFit::Likelihood	(	const TFltV &	NewLambdaV,
		double &	LEdges,
		double &	LNoEdges
	)

Definition at line 76 of file agmfit.cpp.

References CIDNSetV, ComEdgesV, EdgeComVH, IAssert, THash< TKey, TDat, THashFunc >::Len(), TVec< TVal, TSizeTy >::Len(), THashSet< TKey, THashFunc >::Len(), TFlt::Mn, PNoCom, RegCoef, SelectLambdaSum(), and TLinAlg::SumVec().

                                                                                    {
  IAssert(CIDNSetV.Len() == NewLambdaV.Len());
  IAssert(ComEdgesV.Len() == CIDNSetV.Len());
  LEdges = 0.0; LNoEdges = 0.0;
  for (int e = 0; e < EdgeComVH.Len(); e++) {
    TIntSet& JointCom = EdgeComVH[e];
    double LambdaSum = SelectLambdaSum(NewLambdaV, JointCom);
    double Puv = 1 - exp(- LambdaSum);
    if (JointCom.Len() == 0) {  Puv = PNoCom;  }
    IAssert(! _isnan(log(Puv)));
    LEdges += log(Puv);
  }
  for (int k = 0; k < NewLambdaV.Len(); k++) {
    int MaxEk = CIDNSetV[k].Len() * (CIDNSetV[k].Len() - 1) / 2;
    int NotEdgesInCom = MaxEk - ComEdgesV[k];
    if(NotEdgesInCom > 0) {
      if (LNoEdges >= TFlt::Mn + double(NotEdgesInCom) * NewLambdaV[k]) { 
        LNoEdges -= double(NotEdgesInCom) * NewLambdaV[k];
      }
    }
  }
  double LReg = 0.0;
  if (RegCoef > 0.0) {
    LReg = - RegCoef * TLinAlg::SumVec(NewLambdaV);
  }
  return LEdges + LNoEdges + LReg;
}

Here is the call graph for this function:

void TAGMFit::Load	(	TSIn &	SIn,
		const int &	RndSeed = `0`
	)

Definition at line 24 of file agmfit.cpp.

References BaseCID, CIDNSetV, ComEdgesV, EdgeComVH, G, LambdaV, THash< TKey, TDat, THashFunc >::Load(), TVec< TVal, TSizeTy >::Load(), TInt::Load(), TFlt::Load(), MaxLambda, MinLambda, NIDCIDPrH, NIDCIDPrS, NIDComVH, PNoCom, TRnd::PutSeed(), RegCoef, and Rnd.

                                                {
  G = TUNGraph::Load(SIn);
  CIDNSetV.Load(SIn);
  EdgeComVH.Load(SIn);
  NIDComVH.Load(SIn);
  ComEdgesV.Load(SIn);
  PNoCom.Load(SIn);
  LambdaV.Load(SIn);
  NIDCIDPrH.Load(SIn);
  NIDCIDPrS.Load(SIn);
  MinLambda.Load(SIn);
  MaxLambda.Load(SIn);
  RegCoef.Load(SIn);
  BaseCID.Load(SIn);
  Rnd.PutSeed(RndSeed);
}

Here is the call graph for this function:

int TAGMFit::MLEGradAscentGivenCAG	(	const double &	Thres = `0.001`,
		const int &	MaxIter = `10000`,
		const TStr	PlotNm = `TStr()`
	)

Gradient descent for p_c while keeping the community affiliation graph (CAG) fixed.

Definition at line 130 of file agmfit.cpp.

References TVec< TVal, TSizeTy >::Add(), TStr::Empty(), G, TUNGraph::GetEdges(), GetStepSizeByLineSearchForLambda(), TExeTm::GetTmStr(), GradLogLForLambda(), Kilo, LambdaV, TVec< TVal, TSizeTy >::Len(), Likelihood(), MaxLambda, MinLambda, TLinAlg::Norm(), and TGnuPlot::PlotValV().

Referenced by TAGMUtil::FindComsByAGM(), and RunMCMC().

                                                                                             {
  int Edges = G->GetEdges();
  TExeTm ExeTm;
  TFltV GradV(LambdaV.Len());
  int iter = 0;
  TIntFltPrV IterLV, IterGradNormV;
  double GradCutOff = 1000;
  for (iter = 0; iter < MaxIter; iter++) {
    GradLogLForLambda(GradV);    //if gradient is going out of the boundary, cut off
    for (int i = 0; i < LambdaV.Len(); i++) {
      if (GradV[i] < -GradCutOff) { GradV[i] = -GradCutOff; }
      if (GradV[i] > GradCutOff) { GradV[i] = GradCutOff; }
      if (LambdaV[i] <= MinLambda && GradV[i] < 0) { GradV[i] = 0.0; }
      if (LambdaV[i] >= MaxLambda && GradV[i] > 0) { GradV[i] = 0.0; }
    }
    double Alpha = 0.15, Beta = 0.2;
    if (Edges > Kilo(100)) { Alpha = 0.00015; Beta = 0.3;}
    double LearnRate = GetStepSizeByLineSearchForLambda(GradV, GradV, Alpha, Beta);
    if (TLinAlg::Norm(GradV) < Thres) { break; }
    for (int i = 0; i < LambdaV.Len(); i++) {
      double Change = LearnRate * GradV[i];
      LambdaV[i] += Change;
      if(LambdaV[i] < MinLambda) { LambdaV[i] = MinLambda;}
      if(LambdaV[i] > MaxLambda) { LambdaV[i] = MaxLambda;}
    }
    if (! PlotNm.Empty()) {
      double L = Likelihood();
      IterLV.Add(TIntFltPr(iter, L));
      IterGradNormV.Add(TIntFltPr(iter, TLinAlg::Norm(GradV)));
    }
  }
  if (! PlotNm.Empty()) {
    TGnuPlot::PlotValV(IterLV, PlotNm + ".likelihood_Q");
    TGnuPlot::PlotValV(IterGradNormV, PlotNm + ".gradnorm_Q");
    printf("MLE for Lambda completed with %d iterations(%s)\n",iter,ExeTm.GetTmStr());
  }
  return iter;
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::NeighborComInit ( const int InitComs )

Initialize node community memberships using best neighborhood communities (see D. Gleich et al. KDD'12).

Definition at line 186 of file agmfit.cpp.

References CIDNSetV, G, TVec< TVal, TSizeTy >::Gen(), TAGMUtil::GetConductance(), TUNGraph::TNodeI::GetDeg(), TUNGraph::GetEdges(), TUNGraph::TNodeI::GetId(), TAGMUtil::GetNbhCom(), TUNGraph::TNodeI::GetNbrNId(), TUNGraph::GetNI(), TUNGraph::GetNIdV(), TUNGraph::GetNodes(), TUNGraph::GetRndNI(), TExeTm::GetTmStr(), IAssert, InitNodeData(), TVec< TVal, TSizeTy >::Len(), and SetDefaultPNoCom().

Referenced by TAGMFit().

                                                {
  CIDNSetV.Gen(InitComs);
  const int Edges = G->GetEdges();
  TFltIntPrV NIdPhiV(G->GetNodes(), 0);
  TIntSet InvalidNIDS(G->GetNodes());
  TIntV ChosenNIDV(InitComs, 0); //FOR DEBUG
  TExeTm RunTm;
  //compute conductance of neighborhood community
  TIntV NIdV;
  G->GetNIdV(NIdV);
  for (int u = 0; u < NIdV.Len(); u++) {
    TIntSet NBCmty(G->GetNI(NIdV[u]).GetDeg() + 1);
    double Phi;
    if (G->GetNI(NIdV[u]).GetDeg() < 5) { //do not include nodes with too few degree
      Phi = 1.0; 
    } else {
      TAGMUtil::GetNbhCom(G, NIdV[u], NBCmty);
      IAssert(NBCmty.Len() == G->GetNI(NIdV[u]).GetDeg() + 1);
      Phi = TAGMUtil::GetConductance(G, NBCmty, Edges);
    }
    NIdPhiV.Add(TFltIntPr(Phi, NIdV[u]));
  }
  NIdPhiV.Sort(true);
  printf("conductance computation completed [%s]\n", RunTm.GetTmStr());
  fflush(stdout);
  //choose nodes with local minimum in conductance
  int CurCID = 0;
  for (int ui = 0; ui < NIdPhiV.Len(); ui++) {
    int UID = NIdPhiV[ui].Val2;
    fflush(stdout);
    if (InvalidNIDS.IsKey(UID)) { continue; }
    ChosenNIDV.Add(UID); //FOR DEBUG
    //add the node and its neighbors to the current community
    CIDNSetV[CurCID].AddKey(UID);
    TUNGraph::TNodeI NI = G->GetNI(UID);
    fflush(stdout);
    for (int e = 0; e < NI.GetDeg(); e++) {
      CIDNSetV[CurCID].AddKey(NI.GetNbrNId(e));
    }
    //exclude its neighbors from the next considerations
    for (int e = 0; e < NI.GetDeg(); e++) {
      InvalidNIDS.AddKey(NI.GetNbrNId(e));
    }
    CurCID++;
    fflush(stdout);
    if (CurCID >= InitComs) { break;  }
  }
  if (InitComs > CurCID) {
    printf("%d communities needed to fill randomly\n", InitComs - CurCID);
  }
  //assign a member to zero-member community (if any)
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    if (CIDNSetV[c].Len() == 0) {
      int ComSz = 10;
      for (int u = 0; u < ComSz; u++) {
        int UID = G->GetRndNI().GetId();
        CIDNSetV[c].AddKey(UID);
      }
    }
  }
  InitNodeData();
  SetDefaultPNoCom();
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::PrintSummary ( )

COMMENT.

Definition at line 659 of file agmfit.cpp.

References THash< TKey, TDat, THashFunc >::AddDat(), CIDNSetV, ComEdgesV, LambdaV, THash< TKey, TDat, THashFunc >::Len(), TVec< TVal, TSizeTy >::Len(), Likelihood(), NIDCIDPrS, PNoCom, and TFlt::Val.

                           {
  TIntFltH CIDLambdaH(CIDNSetV.Len());
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    CIDLambdaH.AddDat(c, LambdaV[c]);
  }
  CIDLambdaH.SortByDat(false);
  int Coms = 0;
  for (int i = 0; i < LambdaV.Len(); i++) {
    int CID = CIDLambdaH.GetKey(i);
    if (LambdaV[CID] <= 0.0001) { continue; }
    printf("P_c : %.3f Com Sz: %d, Total Edges inside: %d \n", 1.0 - exp(- LambdaV[CID]), CIDNSetV[CID].Len(), (int) ComEdgesV[CID]);
    Coms++;
  }
  printf("%d Communities, Total Memberships = %d, Likelihood = %.2f, Epsilon = %f\n", Coms, NIDCIDPrS.Len(), Likelihood(), PNoCom.Val);
}

Here is the call graph for this function:

void TAGMFit::RandomInit ( const int & MaxK )

COMMENT.

Definition at line 169 of file agmfit.cpp.

References TVec< TVal, TSizeTy >::Add(), THashSet< TKey, THashFunc >::AddKey(), CIDNSetV, TVec< TVal, TSizeTy >::Clr(), G, TUNGraph::TNodeI::GetDeg(), TUNGraph::TNodeI::GetId(), TUNGraph::TNodeI::GetNbrNId(), TUNGraph::GetNI(), TRnd::GetUniDevInt(), InitNodeData(), TVec< TVal, TSizeTy >::Last(), Rnd, and SetDefaultPNoCom().

                                        {
  CIDNSetV.Clr();
  for (int c = 0; c < MaxK; c++) {
    CIDNSetV.Add();
    int NC = Rnd.GetUniDevInt(G -> GetNodes());
    TUNGraph::TNodeI NI = G -> GetRndNI();
    CIDNSetV.Last().AddKey(NI.GetId());
    for (int v = 0; v < NC; v++) {
      NI = G->GetNI(NI.GetNbrNId(Rnd.GetUniDevInt(NI.GetDeg())));
      CIDNSetV.Last().AddKey(NI.GetId());
    }
  }
  InitNodeData();
  SetDefaultPNoCom();
}

Here is the call graph for this function:

void TAGMFit::RandomInitCmtyVV	(	const int	InitComs,
		const double	ComSzAlpha = `1.3`,
		const double	MemAlpha = `1.8`,
		const int	MinComSz = `8`,
		const int	MaxComSz = `200`,
		const int	MinMem = `1`,
		const int	MaxMem = `10`
	)

Randomly initialize bipartite community affiliation graph.

Definition at line 42 of file agmfit.cpp.

References G, TAGMUtil::GenCmtyVVFromPL(), TUNGraph::GetNodes(), Rnd, and SetCmtyVV().

                                                                                                                                                                             {
  TVec<TIntV> InitCmtyVV(InitComs, 0);
  TAGMUtil::GenCmtyVVFromPL(InitCmtyVV, G, G->GetNodes(), InitComs, ComSzAlpha, MemAlpha, MinComSz, MaxComSz,
      MinMem, MaxMem, Rnd);
  SetCmtyVV(InitCmtyVV);
}

Here is the call graph for this function:

int TAGMFit::RemoveEmptyCom ( )

Remove all communities with no members.

Definition at line 459 of file agmfit.cpp.

References CIDNSetV, TVec< TVal, TSizeTy >::DelLast(), LambdaV, TVec< TVal, TSizeTy >::Last(), and TVec< TVal, TSizeTy >::Len().

                            {
  int DelCnt = 0;
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    if (CIDNSetV[c].Len() == 0) {
      CIDNSetV[c] = CIDNSetV.Last();
      CIDNSetV.DelLast();
      LambdaV[c] = LambdaV.Last();
      LambdaV.DelLast();
      DelCnt++;
      c--;
    }
  }
  return DelCnt;
}

Here is the call graph for this function:

void TAGMFit::RunMCMC	(	const int &	MaxIter,
		const int &	EvalLambdaIter,
		const TStr &	PlotFPrx = `TStr()`
	)

Main procedure for fitting the AGM to a given graph using MCMC.

MCMC fitting.

Definition at line 366 of file agmfit.cpp.

References TVec< TVal, TSizeTy >::Add(), TGnuPlot::AddPlot(), BaseCID, CIDNSetV, TStr::Fmt(), G, TUNGraph::GetEdges(), TUNGraph::GetNodes(), TExeTm::GetTmStr(), TRnd::GetUniDev(), gpwLinesPoints, InitNodeData(), JoinCom(), LambdaV, LeaveCom(), TStr::Len(), TVec< TVal, TSizeTy >::Len(), Likelihood(), MLEGradAscentGivenCAG(), TMath::Mx(), Rnd, SampleTransition(), TGnuPlot::SavePng(), TGnuPlot::SetDataPlotFNm(), TGnuPlot::SetTitle(), and TExeTm::Tick().

Referenced by TAGMUtil::FindComsByAGM().

                                                                                         {
  TExeTm IterTm, TotalTm;
  double PrevL = Likelihood(), DeltaL = 0;
  double BestL = PrevL;
  printf("initial likelihood = %f\n",PrevL);
  TIntFltPrV IterTrueLV, IterJoinV, IterLeaveV, IterAcceptV, IterSwitchV, IterLBV;
  TIntPrV IterTotMemV;
  TIntV IterV;
  TFltV BestLV;
  TVec<TIntSet> BestCmtySetV;
  int SwitchCnt = 0, LeaveCnt = 0, JoinCnt = 0, AcceptCnt = 0, ProbBinSz;
  int Nodes = G->GetNodes(), Edges = G->GetEdges();
  TExeTm PlotTm;
  ProbBinSz = TMath::Mx(1000, G->GetNodes() / 10); //bin to compute probabilities
  IterLBV.Add(TIntFltPr(1, BestL));

  for (int iter = 0; iter < MaxIter; iter++) {
    IterTm.Tick();
    int NID = -1;
    int JoinCID = -1, LeaveCID = -1;
    SampleTransition(NID, JoinCID, LeaveCID, DeltaL); //sample a move
    double OptL = PrevL;
    if (DeltaL > 0 || Rnd.GetUniDev() < exp(DeltaL)) { //if it is accepted
      IterTm.Tick();
      if (LeaveCID > -1 && LeaveCID != BaseCID) { LeaveCom(NID, LeaveCID); }
      if (JoinCID > -1 && JoinCID != BaseCID) { JoinCom(NID, JoinCID); }
      if (LeaveCID > -1 && JoinCID > -1 && JoinCID != BaseCID && LeaveCID != BaseCID) { SwitchCnt++; }
      else if (LeaveCID > -1  && LeaveCID != BaseCID) { LeaveCnt++;}
      else if (JoinCID > -1 && JoinCID != BaseCID) { JoinCnt++;}
      AcceptCnt++;
      if ((iter + 1) % EvalLambdaIter == 0) {
        IterTm.Tick();
        MLEGradAscentGivenCAG(0.01, 3);
        OptL = Likelihood();
      }
      else{
        OptL = PrevL + DeltaL;
      }
      if (BestL <= OptL && CIDNSetV.Len() > 0) {
        BestCmtySetV = CIDNSetV;
        BestLV = LambdaV;
        BestL = OptL;
      }
    }
    if (iter > 0 && (iter % ProbBinSz == 0) && PlotFPrx.Len() > 0) { 
      IterLBV.Add(TIntFltPr(iter, OptL));
      IterSwitchV.Add(TIntFltPr(iter, (double) SwitchCnt / (double) AcceptCnt));
      IterLeaveV.Add(TIntFltPr(iter, (double) LeaveCnt / (double) AcceptCnt));
      IterJoinV.Add(TIntFltPr(iter, (double) JoinCnt / (double) AcceptCnt));
      IterAcceptV.Add(TIntFltPr(iter, (double) AcceptCnt / (double) ProbBinSz));
      SwitchCnt = JoinCnt = LeaveCnt = AcceptCnt = 0;
    }
    PrevL = OptL;
    if ((iter + 1) % 10000 == 0) {
      printf("\r%d iterations completed [%.2f]", iter, (double) iter / (double) MaxIter);
    }
  }
  
  // plot the likelihood and acceptance probabilities if the plot file name is given
  if (PlotFPrx.Len() > 0) {
    TGnuPlot GP1;
    GP1.AddPlot(IterLBV, gpwLinesPoints, "likelihood");
    GP1.SetDataPlotFNm(PlotFPrx + ".likelihood.tab", PlotFPrx + ".likelihood.plt");
    TStr TitleStr = TStr::Fmt(" N:%d E:%d", Nodes, Edges);
    GP1.SetTitle(PlotFPrx + ".likelihood" + TitleStr);
    GP1.SavePng(PlotFPrx + ".likelihood.png");

    TGnuPlot GP2;
    GP2.AddPlot(IterSwitchV, gpwLinesPoints, "Switch");
    GP2.AddPlot(IterLeaveV, gpwLinesPoints, "Leave");
    GP2.AddPlot(IterJoinV, gpwLinesPoints, "Join");
    GP2.AddPlot(IterAcceptV, gpwLinesPoints, "Accept");
    GP2.SetTitle(PlotFPrx + ".transition");
    GP2.SetDataPlotFNm(PlotFPrx + "transition_prob.tab", PlotFPrx + "transition_prob.plt");
    GP2.SavePng(PlotFPrx + "transition_prob.png");
  }
  CIDNSetV = BestCmtySetV;
  LambdaV = BestLV;

  InitNodeData();
  MLEGradAscentGivenCAG(0.001, 100);
  printf("\nMCMC completed (best likelihood: %.2f) [%s]\n", BestL, TotalTm.GetTmStr());
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::SampleTransition	(	int &	NID,
		int &	JoinCID,
		int &	LeaveCID,
		double &	DeltaL
	)

Sample MMCM transitions: Choose among (join, leave, switch), and then sample (NID, CID).

Definition at line 325 of file agmfit.cpp.

References BaseCID, CIDNSetV, G, THash< TKey, TDat, THashFunc >::GetDat(), THash< TKey, TDat, THashFunc >::GetKey(), TUNGraph::GetNodes(), THash< TKey, TDat, THashFunc >::GetRndKeyId(), TUNGraph::GetRndNId(), TRnd::GetUniDevInt(), THash< TKey, TDat, THashFunc >::IsKey(), THash< TKey, TDat, THashFunc >::Len(), TVec< TVal, TSizeTy >::Len(), THashSet< TKey, THashFunc >::Len(), TFlt::Mn, NIDCIDPrS, NIDComVH, Rnd, SeekJoin(), SeekLeave(), SeekSwitch(), TPair< TVal1, TVal2 >::Val1, and TPair< TVal1, TVal2 >::Val2.

Referenced by RunMCMC().

                                                                                    {
  int Option = Rnd.GetUniDevInt(3); //0:Join 1:Leave 2:Switch
  if (NIDCIDPrS.Len() <= 1) {    Option = 0;  } //if there is only one node membership, only join is possible.
  int TryCnt = 0;
  const int MaxTryCnt = G->GetNodes();
  DeltaL = TFlt::Mn;
  if (Option == 0) {
    do {
      JoinCID = Rnd.GetUniDevInt(CIDNSetV.Len());
      NID = G->GetRndNId();
    } while (TryCnt++ < MaxTryCnt && NIDCIDPrS.IsKey(TIntPr(NID, JoinCID)));
    if (TryCnt < MaxTryCnt) { //if successfully find a move
      DeltaL = SeekJoin(NID, JoinCID);
    }
  }
  else if (Option == 1) {
    do {
      TIntPr NIDCIDPr = NIDCIDPrS.GetKey(NIDCIDPrS.GetRndKeyId(Rnd));
      NID = NIDCIDPr.Val1;
      LeaveCID = NIDCIDPr.Val2;
    } while (TryCnt++ < MaxTryCnt && LeaveCID == BaseCID);
    if (TryCnt < MaxTryCnt) {//if successfully find a move
      DeltaL = SeekLeave(NID, LeaveCID);
    }
  }
  else{
    do {
      TIntPr NIDCIDPr = NIDCIDPrS.GetKey(NIDCIDPrS.GetRndKeyId(Rnd));
      NID = NIDCIDPr.Val1;
      LeaveCID = NIDCIDPr.Val2;
    } while (TryCnt++ < MaxTryCnt && (NIDComVH.GetDat(NID).Len() == CIDNSetV.Len() || LeaveCID == BaseCID));
    do {
      JoinCID = Rnd.GetUniDevInt(CIDNSetV.Len());
    } while (TryCnt++ < G->GetNodes() && NIDCIDPrS.IsKey(TIntPr(NID, JoinCID)));
    if (TryCnt < MaxTryCnt) {//if successfully find a move
      DeltaL = SeekSwitch(NID, LeaveCID, JoinCID);
    }
  }
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::Save ( TSOut & SOut )

Definition at line 8 of file agmfit.cpp.

References BaseCID, CIDNSetV, ComEdgesV, EdgeComVH, G, LambdaV, MaxLambda, MinLambda, NIDCIDPrH, NIDCIDPrS, NIDComVH, PNoCom, RegCoef, THash< TKey, TDat, THashFunc >::Save(), TUNGraph::Save(), TVec< TVal, TSizeTy >::Save(), TInt::Save(), and TFlt::Save().

                              {
  G->Save(SOut);
  CIDNSetV.Save(SOut);
  EdgeComVH.Save(SOut);
  NIDComVH.Save(SOut);
  ComEdgesV.Save(SOut);
  PNoCom.Save(SOut);
  LambdaV.Save(SOut);
  NIDCIDPrH.Save(SOut);
  NIDCIDPrS.Save(SOut);
  MinLambda.Save(SOut);
  MaxLambda.Save(SOut);
  RegCoef.Save(SOut);
  BaseCID.Save(SOut);
}

Here is the call graph for this function:

double TAGMFit::SeekJoin	(	const int &	UID,
		const int &	CID
	)

Compute the change in likelihood (Delta) if node UID joins community CID.

Definition at line 502 of file agmfit.cpp.

References CIDNSetV, EdgeComVH, G, THash< TKey, TDat, THashFunc >::GetDat(), TUNGraph::TNodeI::GetDeg(), TUNGraph::TNodeI::GetNbrNId(), TUNGraph::GetNI(), IAssert, THashSet< TKey, THashFunc >::IsKey(), LambdaV, TVec< TVal, TSizeTy >::Len(), THashSet< TKey, THashFunc >::Len(), TMath::Mn(), TMath::Mx(), NIDComVH, PNoCom, and SelectLambdaSum().

Referenced by SampleTransition(), and SeekSwitch().

                                                       {
  IAssert(! CIDNSetV[CID].IsKey(UID));
  double Delta = 0.0;
  TUNGraph::TNodeI NI = G->GetNI(UID);
  int NbhsInC = 0;
  for (int e = 0; e < NI.GetDeg(); e++) {
    const int VID = NI.GetNbrNId(e);
    if (! NIDComVH.GetDat(VID).IsKey(CID)) { continue; }
    TIntPr SrcDstNIDPr(TMath::Mn(UID,VID), TMath::Mx(UID,VID));
    TIntSet& JointCom = EdgeComVH.GetDat(SrcDstNIDPr);
    double CurPuv, NewPuv, LambdaSum = SelectLambdaSum(JointCom);
    CurPuv = 1 - exp(- LambdaSum);
    if (JointCom.Len() == 0) { CurPuv = PNoCom; }
    NewPuv = 1 - exp(- LambdaSum - LambdaV[CID]);
    Delta += (log(NewPuv) - log(CurPuv));
    IAssert(!_isnan(Delta));
    NbhsInC++;
  }
  Delta -= LambdaV[CID] * (CIDNSetV[CID].Len() - NbhsInC);
  return Delta;
}

Here is the call graph for this function:

Here is the caller graph for this function:

double TAGMFit::SeekLeave	(	const int &	UID,
		const int &	CID
	)

Compute the change in likelihood (Delta) if node UID leaves community CID.

Definition at line 475 of file agmfit.cpp.

References CIDNSetV, EdgeComVH, G, THash< TKey, TDat, THashFunc >::GetDat(), TUNGraph::TNodeI::GetDeg(), TUNGraph::TNodeI::GetNbrNId(), TUNGraph::GetNI(), IAssert, THashSet< TKey, THashFunc >::IsKey(), TUNGraph::IsNode(), LambdaV, TVec< TVal, TSizeTy >::Len(), THashSet< TKey, THashFunc >::Len(), TMath::Mn(), TMath::Mx(), NIDComVH, PNoCom, and SelectLambdaSum().

Referenced by SampleTransition(), and SeekSwitch().

                                                        {
  IAssert(CIDNSetV[CID].IsKey(UID));
  IAssert(G->IsNode(UID));
  double Delta = 0.0;
  TUNGraph::TNodeI NI = G->GetNI(UID);
  int NbhsInC = 0;
  for (int e = 0; e < NI.GetDeg(); e++) {
    const int VID = NI.GetNbrNId(e);
    if (! NIDComVH.GetDat(VID).IsKey(CID)) { continue; }
    TIntPr SrcDstNIDPr(TMath::Mn(UID,VID), TMath::Mx(UID,VID));
    TIntSet& JointCom = EdgeComVH.GetDat(SrcDstNIDPr);
    double CurPuv, NewPuv, LambdaSum = SelectLambdaSum(JointCom);
    CurPuv = 1 - exp(- LambdaSum);
    NewPuv = 1 - exp(- LambdaSum + LambdaV[CID]);
    IAssert(JointCom.Len() > 0);
    if (JointCom.Len() == 1) {
      NewPuv = PNoCom;
    }
    Delta += (log(NewPuv) - log(CurPuv));
    IAssert(!_isnan(Delta));
    NbhsInC++;
  }
  Delta += LambdaV[CID] * (CIDNSetV[CID].Len() - 1 - NbhsInC);
  return Delta;
}

Here is the call graph for this function:

Here is the caller graph for this function:

double TAGMFit::SeekSwitch	(	const int &	UID,
		const int &	CurCID,
		const int &	NewCID
	)

Definition at line 525 of file agmfit.cpp.

References CIDNSetV, EdgeComVH, G, THash< TKey, TDat, THashFunc >::GetDat(), TUNGraph::TNodeI::GetDeg(), TUNGraph::TNodeI::GetNbrNId(), TUNGraph::GetNI(), IAssert, THashSet< TKey, THashFunc >::IsKey(), LambdaV, THashSet< TKey, THashFunc >::Len(), TMath::Mn(), TMath::Mx(), NIDComVH, PNoCom, SeekJoin(), SeekLeave(), SelectLambdaSum(), and TFlt::Val.

Referenced by SampleTransition().

                                                                               {
  IAssert(! CIDNSetV[NewCID].IsKey(UID));
  IAssert(CIDNSetV[CurCID].IsKey(UID));
  double Delta = SeekJoin(UID, NewCID) + SeekLeave(UID, CurCID);
  //correct only for intersection between new com and current com
  TUNGraph::TNodeI NI = G->GetNI(UID);
  for (int e = 0; e < NI.GetDeg(); e++) {
    const int VID = NI.GetNbrNId(e);
    if (! NIDComVH.GetDat(VID).IsKey(CurCID) || ! NIDComVH.GetDat(VID).IsKey(NewCID)) {continue;}
    TIntPr SrcDstNIDPr(TMath::Mn(UID,VID), TMath::Mx(UID,VID));
    TIntSet& JointCom = EdgeComVH.GetDat(SrcDstNIDPr);
    double CurPuv, NewPuvAfterJoin, NewPuvAfterLeave, NewPuvAfterSwitch, LambdaSum = SelectLambdaSum(JointCom);
    CurPuv = 1 - exp(- LambdaSum);
    NewPuvAfterLeave = 1 - exp(- LambdaSum + LambdaV[CurCID]);
    NewPuvAfterJoin = 1 - exp(- LambdaSum - LambdaV[NewCID]);
    NewPuvAfterSwitch = 1 - exp(- LambdaSum - LambdaV[NewCID] + LambdaV[CurCID]);
    if (JointCom.Len() == 1 || NewPuvAfterLeave == 0.0) {
      NewPuvAfterLeave = PNoCom;
    }
    Delta += (log(NewPuvAfterSwitch) + log(CurPuv) - log(NewPuvAfterLeave) - log(NewPuvAfterJoin));
    if (_isnan(Delta)) {
      printf("NS:%f C:%f NL:%f NJ:%f PNoCom:%f", NewPuvAfterSwitch, CurPuv, NewPuvAfterLeave, NewPuvAfterJoin, PNoCom.Val);
    }
    IAssert(!_isnan(Delta));
  }
  return Delta;
}

Here is the call graph for this function:

Here is the caller graph for this function:

double TAGMFit::SelectLambdaSum ( const TIntSet & ComK )

Compute sum of lambda_c (which is log (1 - p_c)) over C_uv (ComK). The function is used to compute edge probability P_uv.

Definition at line 618 of file agmfit.cpp.

References LambdaV.

Referenced by GradLogLForLambda(), Likelihood(), SeekJoin(), SeekLeave(), and SeekSwitch().

                                                   { 
  return SelectLambdaSum(LambdaV, ComK); 
}

Here is the caller graph for this function:

double TAGMFit::SelectLambdaSum	(	const TFltV &	NewLambdaV,
		const TIntSet &	ComK
	)

COMMENT.

Definition at line 622 of file agmfit.cpp.

References THashSet< TKey, THashFunc >::BegI(), THashSet< TKey, THashFunc >::EndI(), and IAssert.

                                                                            {
  double Result = 0.0;
  for (TIntSet::TIter SI = ComK.BegI(); SI < ComK.EndI(); SI++) {
    IAssert(NewLambdaV[SI.GetKey()] >= 0);
    Result += NewLambdaV[SI.GetKey()];
  }
  return Result;
}

Here is the call graph for this function:

void TAGMFit::SetCmtyVV ( const TVec< TIntV > & CmtyVV )

COMMENT.

Definition at line 584 of file agmfit.cpp.

References CIDNSetV, G, TVec< TVal, TSizeTy >::Gen(), IAssert, InitNodeData(), TUNGraph::IsNode(), TVec< TVal, TSizeTy >::Len(), and SetDefaultPNoCom().

Referenced by AddBaseCmty(), RandomInitCmtyVV(), and TAGMFit().

                                                 {
  CIDNSetV.Gen(CmtyVV.Len());
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    CIDNSetV[c].AddKeyV(CmtyVV[c]);
    for (int j = 0; j < CmtyVV[c].Len(); j++) { IAssert(G->IsNode(CmtyVV[c][j])); } // check whether the member nodes exist in the graph
  }
  InitNodeData();
  SetDefaultPNoCom();
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::SetDefaultPNoCom ( )

Set Epsilon (the probability that two nodes sharing no communities connect) to the default value.

Definition at line 72 of file agmfit.cpp.

References G, TUNGraph::GetNodes(), and PNoCom.

Referenced by NeighborComInit(), RandomInit(), and SetCmtyVV().

                               {
  PNoCom = 1.0 / (double) G->GetNodes() / (double) G->GetNodes();
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::SetLambdaV ( const TFltV & LambdaPt ) [inline]

COMMENT.

Definition at line 91 of file agmfit.h.

References LambdaV.

{LambdaV = LambdaPt;}

void TAGMFit::SetPNoCom ( const double & Epsilon ) [inline]

Set Epsilon (the probability that two nodes sharing no communities connect) to the default value.

Definition at line 57 of file agmfit.h.

References BaseCID, and PNoCom.

Referenced by TAGMUtil::FindComsByAGM().

{ if (BaseCID == -1 && Epsilon > 0.0) { PNoCom = Epsilon; } }

Here is the caller graph for this function:

void TAGMFit::SetRegCoef ( const double Val ) [inline]

Definition at line 45 of file agmfit.h.

References RegCoef.

Referenced by TAGMUtil::FindComsByAGM().

{ RegCoef = Val; }

Here is the caller graph for this function:

Member Data Documentation

TInt TAGMFit::BaseCID [private]

ID of the Epsilon-community (in case we fit P_c of the epsilon community). We do not fit for the Epsilon-community in general.

Definition at line 23 of file agmfit.h.

Referenced by AddBaseCmty(), GetCmtyVV(), Load(), RunMCMC(), SampleTransition(), Save(), and SetPNoCom().

TVec<TIntSet> TAGMFit::CIDNSetV [private]

Community ID -> Member Node ID Sets.

Definition at line 11 of file agmfit.h.

Referenced by AddBaseCmty(), GetCmtyVV(), GetEdgeJointCom(), GradLogLForLambda(), InitNodeData(), JoinCom(), LeaveCom(), Likelihood(), Load(), NeighborComInit(), PrintSummary(), RandomInit(), RemoveEmptyCom(), RunMCMC(), SampleTransition(), Save(), SeekJoin(), SeekLeave(), SeekSwitch(), and SetCmtyVV().

TIntV TAGMFit::ComEdgesV [private]

The number of edges in each community.

Definition at line 14 of file agmfit.h.

Referenced by GetEdgeJointCom(), GradLogLForLambda(), InitNodeData(), JoinCom(), LeaveCom(), Likelihood(), Load(), PrintSummary(), and Save().

THash<TIntPr,TIntSet> TAGMFit::EdgeComVH [private]

Edge -> Shared Community ID Set.

Definition at line 12 of file agmfit.h.

Referenced by GetEdgeJointCom(), GradLogLForLambda(), JoinCom(), LeaveCom(), Likelihood(), Load(), Save(), SeekJoin(), SeekLeave(), and SeekSwitch().

PUNGraph TAGMFit::G [private]

Graph to fit.

Definition at line 10 of file agmfit.h.

Referenced by AddBaseCmty(), CalcPNoComByCmtyVV(), GetCmtyVV(), GetEdgeJointCom(), InitNodeData(), JoinCom(), LeaveCom(), Load(), MLEGradAscentGivenCAG(), NeighborComInit(), RandomInit(), RandomInitCmtyVV(), RunMCMC(), SampleTransition(), Save(), SeekJoin(), SeekLeave(), SeekSwitch(), SetCmtyVV(), and SetDefaultPNoCom().

TFltV TAGMFit::LambdaV [private]

Parametrization of P_c (edge probability in community c), P_c = 1 - exp(-lambda).

Definition at line 16 of file agmfit.h.

Referenced by GetCmtyVV(), GetLambdaV(), GetQV(), GetStepSizeByLineSearchForLambda(), GradLogLForLambda(), InitNodeData(), Likelihood(), Load(), MLEGradAscentGivenCAG(), PrintSummary(), RemoveEmptyCom(), RunMCMC(), Save(), SeekJoin(), SeekLeave(), SeekSwitch(), SelectLambdaSum(), and SetLambdaV().

TFlt TAGMFit::MaxLambda [private]

Maximum value of regularization parameter lambda (default = 10).

Definition at line 21 of file agmfit.h.

Referenced by GetStepSizeByLineSearchForLambda(), InitNodeData(), Load(), MLEGradAscentGivenCAG(), and Save().

TFlt TAGMFit::MinLambda [private]

Minimum value of regularization parameter lambda (default = 1e-5).

Definition at line 20 of file agmfit.h.

Referenced by GetCmtyVV(), GetStepSizeByLineSearchForLambda(), InitNodeData(), Load(), MLEGradAscentGivenCAG(), and Save().

THash<TIntPr,TFlt> TAGMFit::NIDCIDPrH [private]

<Node ID, Community ID> pairs (for sampling MCMC moves).

Definition at line 18 of file agmfit.h.

Referenced by Load(), and Save().

THash<TIntPr,TInt> TAGMFit::NIDCIDPrS [private]

<Node ID, Community ID> pairs (for sampling MCMC moves).

Definition at line 19 of file agmfit.h.

Referenced by InitNodeData(), JoinCom(), LeaveCom(), Load(), PrintSummary(), SampleTransition(), and Save().

THash<TInt, TIntSet> TAGMFit::NIDComVH [private]

Node ID -> Communitie IDs the node belongs to.

Definition at line 13 of file agmfit.h.

Referenced by CalcPNoComByCmtyVV(), GetEdgeJointCom(), InitNodeData(), JoinCom(), LeaveCom(), Load(), SampleTransition(), Save(), SeekJoin(), SeekLeave(), and SeekSwitch().

TFlt TAGMFit::PNoCom [private]

Probability of edge when two nodes share no community (epsilon in the paper).

Definition at line 15 of file agmfit.h.

Referenced by AddBaseCmty(), CalcPNoComByCmtyVV(), GetPNoCom(), GradLogLForLambda(), Likelihood(), Load(), PrintSummary(), Save(), SeekJoin(), SeekLeave(), SeekSwitch(), SetDefaultPNoCom(), and SetPNoCom().

TFlt TAGMFit::RegCoef [private]

Regularization parameter when we fit for P_c (for finding # communities).

Definition at line 22 of file agmfit.h.

Referenced by GradLogLForLambda(), Likelihood(), Load(), Save(), and SetRegCoef().

TRnd TAGMFit::Rnd [private]

Definition at line 17 of file agmfit.h.

Referenced by Load(), RandomInit(), RandomInitCmtyVV(), RunMCMC(), and SampleTransition().

The documentation for this class was generated from the following files:

snap-adv/agmfit.h
snap-adv/agmfit.cpp

Public Member Functions

Private Attributes

Detailed Description

Constructor & Destructor Documentation

Member Function Documentation

Member Data Documentation